Extending gpfdist in Cloudberry Database to Support SFTP Protocol for…#1226
Open
ZTE-EBASE wants to merge 28 commits into apache:main
… Data Ingestion
gpfdist is a file distribution program in Cloudberry that loads external data into the database in parallel. Its drawback is that the data files must reside on the same machine as the tool. Extending it to support the SFTP protocol removes this limitation and enables loading files from a remote server.
Fixes #ISSUE_Number
What does this PR do?
By extending the gpfdist tool to support the SFTP protocol, this PR enables remote data loading and removes the requirement that the tool and the data files reside on the same machine.
Type of Change
New feature (non-breaking change)
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel
Impact
Performance:
User-facing changes:
Dependencies:
The ssh2 library must be available at compile time and installed under /usr/local.
Checklist
Additional Context
Under this approach, the location template for the external table is:
Related Test Case:
1. Start gpfdist
2. Create the external table
3. Load the data
4. Check the result
cat test.csv
1|ZTE-EBASE
2|ZTE-EBASE
3|ZTE-EBASE
4|ZTE-EBASE
5|ZTE-EBASE
6|ZTE-EBASE
7|ZTE-EBASE
8|ZTE-EBASE
9|ZTE-EBASE
10|ZTE-EBASE
The number of rows and the content of the table are consistent with the file.
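For anyone reproducing the test case, the sample file shown above can be regenerated with a short shell sketch. It assumes only what the `cat test.csv` output shows: ten pipe-delimited rows with an integer id followed by the fixed string `ZTE-EBASE`. Where the file is placed on the remote SFTP server is not covered here.

```shell
# Recreate the sample file from the test case:
# ten rows, pipe-delimited, id followed by a fixed name.
for i in $(seq 1 10); do
  printf '%s|ZTE-EBASE\n' "$i"
done > test.csv

# Count the rows in the file; the loaded table should contain the same number.
wc -l < test.csv
```

The row count printed by the last command can be compared with a `SELECT count(*)` on the target table after the load.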
CI Skip Instructions